Texture sampling
Each texture processing cluster inside the GT200 core has access to its own texture sampling unit, meaning there are ten texture samplers in a fully-functional GeForce GTX 280 graphics processing unit.
Each texture sampling unit can address and bilinear-filter eight texels per clock; alternatively, it can address and filter four 2:1 anisotropically filtered or four FP16 bilinear-filtered texels per clock.
This is exactly the same functionality exposed inside the G92 core, but Nvidia claims that GT200's texture units are up to 22 percent more efficient than G92's, so the chip can get closer to its peak bilinear-filtered rate. The company says that this is thanks in large part to the tweaked, more efficient scheduler it has employed in GT200.
Nvidia has also tweaked the shader-to-texture ratio on GT200, saying it believes the architecture is now better tuned for today's (and tomorrow's) 3D graphics loads. There are still eight texture units per texture processing cluster, so the texture sampler hardware hasn't been cut down. Instead, Nvidia has beefed up the number of shader processors per texture sampler in GT200: there are now 24 shader processors (three streaming multiprocessors) per texture processing cluster, compared to 16 SPs per TPC in G80 and G92.
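To put numbers on that ratio change, here is a minimal sketch using only the per-cluster figures quoted above (ten TPCs, eight texture units per TPC, and 24 versus 16 shader processors per TPC). The function and variable names are ours, purely for illustration.

```python
# Per-TPC figures quoted in the text above.
TEXTURE_UNITS_PER_TPC = 8

sps_per_tpc = {
    "GT200": 24,    # three streaming multiprocessors per TPC
    "G80/G92": 16,
}


def shader_to_texture_ratio(sps: int, texture_units: int = TEXTURE_UNITS_PER_TPC) -> float:
    """Shader processors available per texture unit within one TPC."""
    return sps / texture_units


ratios = {chip: shader_to_texture_ratio(sps) for chip, sps in sps_per_tpc.items()}

# Chip-wide totals for a fully functional GeForce GTX 280 (ten TPCs).
GT200_TPCS = 10
total_texture_units = GT200_TPCS * TEXTURE_UNITS_PER_TPC
total_sps = GT200_TPCS * sps_per_tpc["GT200"]

for chip, ratio in ratios.items():
    print(f"{chip}: {ratio:.1f} shader processors per texture unit")
print(f"GTX 280 totals: {total_sps} SPs, {total_texture_units} texture units")
```

In other words, the ratio moves from 2:1 to 3:1 while the texture hardware per cluster stays the same.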
[Chart: texture fill rate in Mtexels/sec (Single Texture, Multi Texture and Theoretical Peak) for the Nvidia GeForce 9800 GX2 1GB, Nvidia GeForce GTX 280 1GB, Nvidia GeForce 9800 GTX 512MB, ATI Radeon HD 3870 X2 1GB and Nvidia GeForce 8800 Ultra 768MB]
GT200's texturing efficiency is pretty good compared to the GeForce 9800 GTX: it achieves 93 percent of its peak in 3DMark06's multi-texturing test, while the GeForce 9800 GTX only hits 73.5 percent of its peak throughput. You're probably wondering why the GeForce 8800 Ultra achieves such exceptional texture throughput relative to its peak rate.
The reason is that G80's texture samplers are limited by the number of addresses they can handle, so the texture filtering units aren't being fully utilised. The card is therefore able to get close to its peak throughput, but that figure doesn't take the (potentially) idle filtering hardware into account.
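The explanation above can be sketched as a simple bottleneck calculation. Note that the unit counts and clock used here (eight TPCs, four address units and eight filter units per TPC, 612MHz core clock for the GeForce 8800 Ultra) are our assumptions based on commonly published specifications; they are not figures stated in this article, and the function is a hypothetical illustration rather than anything Nvidia provides.

```python
def texture_rates(tpcs: int, address_per_tpc: int, filter_per_tpc: int, core_mhz: int) -> dict:
    """Return address-limited and filter-limited bilinear rates in Mtexels/sec."""
    address_rate = tpcs * address_per_tpc * core_mhz
    filter_rate = tpcs * filter_per_tpc * core_mhz
    # A bilinear texel needs both an address and a filter slot,
    # so the slower of the two stages sets the achievable rate.
    achievable = min(address_rate, filter_rate)
    return {
        "address_rate": address_rate,
        "filter_rate": filter_rate,
        "achievable": achievable,
        "filter_utilisation": achievable / filter_rate,
    }


# ASSUMED figures for the GeForce 8800 Ultra (G80), not taken from the article.
g80_ultra = texture_rates(tpcs=8, address_per_tpc=4, filter_per_tpc=8, core_mhz=612)

print(f"Address-limited rate: {g80_ultra['address_rate']} Mtexels/sec")
print(f"Filter capacity:      {g80_ultra['filter_rate']} Mtexels/sec")
print(f"Filter utilisation:   {g80_ultra['filter_utilisation']:.0%}")
```

Under those assumptions, a card can sit very close to its address-limited peak while half of its bilinear filtering capacity goes unused, which is the effect described above.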
[Chart: texture fill rate in Gtexels/sec for the Nvidia GeForce 9800 GX2 1GB, ATI Radeon HD 3870 X2 1GB, Nvidia GeForce GTX 280 1GB, Nvidia GeForce 9800 GTX 512MB and Nvidia GeForce 8800 Ultra 768MB]
3DMark Vantage's test is a strange one in many respects because it incorrectly converts its results from frames per second into gigatexels per second. This test is focused on FP16 bilinear filtering, which runs at half speed on Nvidia's hardware (it runs at full speed on the Radeon HD 3870 X2), so the theoretical peak for the GeForce GTX 280 is around 24 gigatexels per second. As you can see from the numbers in the graph, the reported results are out by around a factor of 30, but they do scale as expected with the theoretical throughputs.
With D3D RightMark, we see some results that are more in line with what we'd expect from the texturing hardware in the respective GPUs. The GeForce GTX 280 is roughly twice as fast as the GeForce 8800 Ultra, which roughly lines up with 3DMark06's multi-texture fillrate test. Both tests are integer-based, though, so that is to be expected.
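For reference, the 24 gigatexels per second figure can be reproduced with a quick back-of-the-envelope sketch. The 80 texture units come from the ten TPCs with eight units each described earlier; the 602MHz core clock is our assumption (the GeForce GTX 280's published reference clock) and is not stated in this section.

```python
def peak_gtexels(texture_units: int, core_mhz: int, rate_factor: float = 1.0) -> float:
    """Theoretical texture rate in Gtexels/sec.

    rate_factor is the fraction of full speed for the filtering mode,
    e.g. 0.5 for FP16 bilinear on Nvidia's hardware as described above.
    """
    return texture_units * core_mhz * rate_factor / 1000.0


GTX280_TEXTURE_UNITS = 10 * 8   # ten TPCs, eight texture units each
GTX280_CORE_MHZ = 602           # ASSUMED reference core clock, not from the article

int8_peak = peak_gtexels(GTX280_TEXTURE_UNITS, GTX280_CORE_MHZ)
fp16_peak = peak_gtexels(GTX280_TEXTURE_UNITS, GTX280_CORE_MHZ, rate_factor=0.5)

print(f"Integer bilinear peak: {int8_peak:.2f} Gtexels/sec")
print(f"FP16 bilinear peak:    {fp16_peak:.2f} Gtexels/sec")
```

With those inputs the half-rate FP16 peak works out to roughly 24 gigatexels per second, matching the figure quoted above.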